ABSTRACT

Planned Variation was designed as a three-year
program to assess the implementation of prominent preschool curricula
in Head Start and the immediate effects of the programs. Sites used
were those in which the sponsor already had a Follow Through program;
the research project lacked the necessary control over site
characteristics. Consultants visited the sites monthly. The classroom
observation form and observer rating scale were keyed to what the
sponsors said distinguished their model. Consultants developed
sponsor-specific checklists. Controversy over expected outcomes and
selection of tests of cognitive development created additional
problems. It was found that statistical analysis could not compensate
for the research design. Year 1 saw an emphasis on assessing
implementation, the creation of the Classroom Observation instrument,
the investment in creating new measures for years 2 and 3, and the
clinical case history and the consultant as innovations. Year 2 added
a review panel for the project and increased the investment in
developing new child and family measures. Year 3 added
sponsor-specific studies, research for individual sponsors. Year 4 is
for phasing out the sites. A summary is made of what was learned
about evaluative research administration that may be applicable to
similar studies. (KM)
This is a report not on the findings or the results of the national
Head Start/Follow Through Planned Variation Study, but on the evaluation
itself: what was done, how the study was conducted, why we did what
we did, the shortfalls in methodology, approach, and evaluation management,
and the methodological advances. This last is an important question,
since the study is among the most costly pieces of educational research
conducted recently, and one "product" is the learning about evaluation
methods and management we claim has occurred.

Some definitions first: Planned Variation is a research study
intended to determine which of several outstanding early childhood curricula
have the greatest immediate effects in Head Start, and whether participation
in well-planned, well-implemented, continuous programs would yield
continuous development in the children. By grace of the demands of the
Westinghouse Report, the then Bureau of the Budget, and our own concerns
at Head Start with program improvement, the study was designed to explain
a frequently occurring phenomenon: the curve shown in Figure 1. The
curve shows an immediate impact of a preschool intervention, a catch-up
by the control group after school entry, and a gradual decline in achievement
of both experimental and control groups after the third or fourth grade.
What program would yield the greatest effects on what measures for what
children? Maybe the "right" program would have a lasting effect? Or
would continuity of experience in any curriculum that was well-planned
and supervised have sustained effects?

Paper presented at the National Association for the Education of Young
Children Conference, November, 1972.

As previously noted, the study originated both in a concern for
Head Start program improvement through incorporation of effective new
curricula into the daily program, and in the need to "justify" preschool
intervention as public policy by the magnitude and durability of its
benefits. Such a statement assumes that a happy, healthy, "good"
experience for low-income youngsters would popularly justify public
investment only if there were long-term gains in matters of
public, social concern, such as academic achievement.

This is a value issue that generates considerable heat. The concerns
of the '50s and '60s with inequality of educational opportunity stemmed
from a belief that education and later economic status were related. The
high rate of school failure and low achievement on standard tests of
reading, arithmetic, and, in older grades, of language and quantitative
comprehension and problem-solving, were endemic among the poor, and
particularly among poor blacks. Thus the public expectancy that preschool
programs ought to have a durable effect on academic achievement, if public
funds are to be spent on income-segregated programs for which the working
marginal poor and lower middle class are not eligible, is not an unreasonable
expectation. On the other hand, blacks and whites of equal academic
achievement have unequal incomes (which places the blame for economic
inequities on other shoulders than the schools per se), and the 1954
Brown decision was predicated on the sense of inequality and unworthiness
assumed to be perpetuated by segregated public institutions. Thus public
expectancy for preschools could well be limited to delivery of services
(which most agree are well provided by Head Start to enthusiastic parents
and children) and immediate socialization benefits. Since mostly low-income
children attend federally supported preschools, however, the in-practice
exclusion of the working poor and lower middle class from Head
Start has probably reduced the strength of the second argument for many
taxpayers who cannot afford preschools for their own children. Thus,
the academic achievement issue is prominent in decisions on whether public
funds support one program or another for children. The Westinghouse Study
and our own smaller-scale longitudinal studies did not show durable
academic effects in most circumstances: would a good Follow Through
program linked to a good Head Start have the continuity of effects
expected when Follow Through was funded? And would the Head Start
experience be a necessary experience, or could entry into the program be
delayed until Follow Through with no apparent irreversible deficit?

It should be noted at this point that the "effects" required are not
limited to IQ by some conspiracy. Motivational changes, social
adjustment, positive self-image, sense of hope and self-worth, better use
of basic abilities, achievement in school as measured by any appropriate
instrument: the responsibility for defining and measuring the outcomes
which are educationally significant to a great extent rests with us, not
with some mythical group who are bedazzled by IQs. The policy-makers
to whom I have talked are far more interested in achievement and
competence than in IQ. We, the researchers, haven't delivered evidence on these
variables, and we, not Congress or OMB, selected IQ as a reliable,
meaningful proxy for other events. It is more an instance of
"put up or shut up" than of crucifying children on the cross of IQ. No
one I know (parents, teachers, researchers, policy-makers) wants to do
this. But in practice, unfortunately, there are few measures which are
reliable, meaningfully interrelated, and feasible except the standardized
tests, and this despite prolonged, large investments in developing other
measures.

A second point of definition: by evaluative research I mean an
assessment of (1) what the treatment or the program was, (2) whether the
treatment or program had the effects it intended to have, and (3) how
different treatments or programs compared in the extent to which they
reached their own goals (criterion-referenced evaluation) and transferred
to broader goals. The Planned Variation Study was not experimental in
the sense of control by the researchers over the treatment and who received
it; it was a quasi-experimental evaluative research study with limited
ability to control who received treatment or how many replications could
be located where.

In discussing Planned Variation as a quasi-experimental study, I
will consider first the research design, measures, and analytic approach,
and then discuss questions of research management and research utilization.



Planned Variation was designed as a three-year program to assess the
implementation of prominent preschool curricula and the immediate effects
of the different programs. The curricula were chosen in all but two
instances because these programs were installed in Follow Through during
1967-69 and already had extensions downward to the preschool years. A
three-year program was planned from the beginning because we expected
that sponsors would have to train staff and learn how to operate in Head
Start, and we wanted to assess both the ease of implementation of different
models and their effects after a reasonable time for them to become fully
operational. This was, and for some studies still is, an innovation in
national studies. The performance contracting experiment, for example,
gave only one year for the final test of program cost-effectiveness. The
Experimental Schools program, on the other hand, began with a five-year
implementation period. We now believe that more than one year is essential,
but also that time per se is likely to be no guarantee of "ideal"
implementation, in part because of staff turnover everywhere (sponsors,
trainers, teachers, evaluators) and because programs are affected by many
winds of change besides those of the curriculum model (funding hassles,
hassles over control, and other demands on the programs). These were
independently assessed by site visitors, but clearly, the best we could
achieve was measurement of these other factors and, hopefully, covariance.
Again, wisdom is to measure and anticipate these uncontrollable influences
on the program and to shout that what is being tested isn't the ideal, but
what happens in a complex real world. If we want to test the idea in its
pure form, we need far more control over these other factors, the kind of
control one in fact obtains through a program development effort.

The sites selected for PV were those in which the sponsor already
had a Follow Through program. This meant that sponsor, geographic
location, and site characteristics were confounded, since the Follow Through
sites had not been selected in the first place to balance child age at entry,
ethnicity, SES mix, urbanicity, region, and other factors which can affect
entry characteristics, implementation, program acceptability, and
outcomes across sponsors. (This variability was not due to the inability of
the very competent Follow Through directors to plan a research study.
In 1967-68, Follow Through was initiated as a national program to serve
all Head Start children. After a cooling of interest in 1968, Follow
Through expansion was halted and the program transformed into a national
experiment, using the sites where programs had been started and a commitment
made to the community and staff.)
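Why no analysis can undo this confounding can be shown with a small sketch. Assume, hypothetically, that each of two sponsors (labeled A and B purely for illustration) operates only in sites of one kind, as happens when sponsors are placed wherever they already run Follow Through. The design matrix for a regression on sponsor and urbanicity is then rank-deficient, and two different stories ("sponsor B adds a point" and "urban sites lose a point") fit the data identically. The layouts and effect sizes are invented, not Planned Variation data, and the sketch assumes NumPy is available.

```python
import numpy as np

# Hypothetical site layouts as (sponsor, urban) pairs; illustrative only.
CONFOUNDED = [("A", 1), ("A", 1), ("A", 1), ("B", 0), ("B", 0), ("B", 0)]
BALANCED = [("A", 1), ("A", 0), ("A", 1), ("B", 0), ("B", 1), ("B", 0)]


def design_matrix(sites):
    """Columns: intercept, sponsor-B indicator, urban indicator."""
    sponsor_b = np.array([1.0 if sponsor == "B" else 0.0 for sponsor, _ in sites])
    urban = np.array([float(u) for _, u in sites])
    return np.column_stack([np.ones(len(sites)), sponsor_b, urban])


def effects_separable(sites):
    """True only if sponsor and urbanicity effects are separately estimable.

    A rank-deficient design matrix means some combination of the two
    effects is not identified, so no regression on these sites can
    attribute an outcome difference to one rather than the other.
    """
    X = design_matrix(sites)
    return np.linalg.matrix_rank(X) == X.shape[1]


X = design_matrix(CONFOUNDED)
sponsor_story = X @ np.array([0.0, 1.0, 0.0])  # sponsor B adds one point
site_story = X @ np.array([1.0, 0.0, -1.0])    # urban sites score one point lower
```

In the confounded layout `sponsor_story` and `site_story` are identical predictions, so the data cannot choose between them; in the balanced layout, where each sponsor has both kinds of site, the matrix has full rank and the two effects can be estimated.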

What we have learned from grappling with the resulting "design" is
that no current statistical technique can compensate for this confounding;
future studies which are asking the planned variation questions must have
better research control. This is, in fact, a general methodological
finding: you cannot put the statistical band-aid of regression analysis
or post hoc matching on a research design that has a broken leg and come
up with much more than hypotheses to be tested on a better day. We will
not learn much from early childhood research until we confront the
issue of service vs. research, and research needs come first, at least if
we want findings that can move programs out of limbo. Our country is
littered with programs that are dying from indifference: the data aren't
unfavorable enough to justify discarding them, clear enough to show
how to modify them, or unequivocally favorable enough to justify expansion.
The sole exception is Sesame Street, which expanded into The Electric
Company, and which combined a highly uniform treatment, plus measurement by
criterion-referenced tests, plus more money invested in Madison Avenue PR
than most R&D programs have for development, plus authorization to expand
commercially into a self-supporting corporation, plus delivery of service
to virtually all homes, more than 95% of which, at all income levels, have
a TV set.

Insofar as possible, Planned Variation required comparison of Head
Start children within the sites, so that the effectiveness of the additional
$350 per child costs of Planned Variation over and above regular Head
Start costs could be assessed. On-site controls have the research virtue
of comparability and the research vice of program dispersion and
contamination. In many sites, there were no on-site Head Start comparisons
available, and we sought off-site comparisons, which were rarely comparable;
on-site, we had contamination. Some sponsors accepted the research conditions;
others had as their agenda reaching every child they could. Even where
sponsors cooperated with the research design, teacher meetings plus teacher
and staff turnover meant contamination. How substantial this was we will know
when the 1968-69 data are analyzed. In some sites, there was a reverse
effect: the experimental programs were not given their usual Head Start
services and supplies because they were experimental, or there was rivalry.
These design problems are not easily resolved: if one selects only larger
sites to reduce contamination and still achieve within-site comparisons,
then the sample is atypical for Head Start. Also, larger sites may have
several delegate agencies, so the true comparability of program
administration is dubious. There are design options, such as paired sites
assigned at random to E and C conditions, but these take time and cooperation.
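The paired-site option mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the sites have already been matched into pairs on characteristics such as size, ethnicity, and urbanicity; the function name and site labels are hypothetical. Within each pair a coin flip decides which site receives the experimental (E) program and which serves as the comparison (C).

```python
import random


def assign_paired_sites(pairs, seed=None):
    """Randomly assign one site of each matched pair to E, the other to C.

    `pairs` is a list of (site, site) tuples matched beforehand
    (hypothetical input). Returns a dict mapping each site to "E" or "C".
    A fixed seed makes the assignment reproducible for audit.
    """
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        # Coin flip within the pair: exactly one E and one C per pair.
        if rng.random() < 0.5:
            assignment[first], assignment[second] = "E", "C"
        else:
            assignment[first], assignment[second] = "C", "E"
    return assignment


pairs = [("site 1", "site 2"), ("site 3", "site 4"), ("site 5", "site 6")]
assignment = assign_paired_sites(pairs, seed=42)
```

Because the coin is flipped within each pair, E and C conditions are balanced on whatever the pairs were matched on, and remaining differences between them are due to chance rather than to which sites a sponsor already served.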

True non-Head Start controls within sites were politically unacceptable
to Head Start national and, I am told, to local staff. In my opinion, this
is a research error that cannot be compensated for in terms of what we can
say about the effects of Head Start and Planned Variation; the nature of
the control group and its incentives are a powerful determinant of
"outcomes," and if comparison groups are "equally effective," there is no
little danger that "no difference" findings can be interpreted as "programs
are equally ineffective."

With regard to measurement, our approach was to invest heavily in
describing what was actually happening. We had several techniques. Most
innovative were educational consultants who visited the sites monthly. A
classroom observation form and observer rating scale keyed to what sponsors
said distinguished their models were developed. In 1971-72, a sponsor-specific,
structured, carefully developed checklist was completed by site
visitor consultants. We had teacher, aide, director, and sponsor ratings
of both overall classroom quality as a Head Start program and implementation
as an exemplar of the model. In retrospect, this investment in
description of the treatment was an immensely worthwhile decision; programs
were changing and curricula were not monolithic. Implementation is worth
studying in its own right and may be essential to analyses of data from a
study of this kind.

For outcome measures for children and parents, we
spent many meetings, workshops, and conferences trying operationally to
define the outcomes anticipated by each sponsor, and to find reliable,
feasible indicators for these outcomes. Some sponsors had little
difficulty; for others (e.g., EDC), there was no outcome for the child:
the message was the medium or process. One moral is that only treatments
which begin by being able to describe what they do, and what they expect
to have happen to children, are suitable for comparative curriculum studies.

Despite these efforts to find good measures, Planned Variation nearly
wrecked on the shoal of the Stanford-Binet: there are few reliable tests,
and participants in Planned Variation (consultants, sponsors, management,
evaluators) held opinions ranging from calling the Binet the crime of the
century and branding as racist anyone who advocated its use, to my own view,
which I held then and still hold, that it is the most reliable, sensitive
indicator we have of general cognitive development for a longitudinal study.
After two years, the Binet was dropped, to be replaced by new
criterion-referenced measures, and I am hoping that these prove sufficiently
reliable to be interpreted. The moral of this, if you will, is my concern that
until the state of the art of measurement is improved, comparative curriculum
studies may be getting us waist-deep in the Big Muddy. If sponsors have central
objectives we cannot measure adequately, then we dare not place them in a
horserace with sponsors whose objectives can be measured reliably, unless
the outcome criterion is ease of implementation or treatment drift rather
than child and family development. Comparison of sponsors who share common
objectives which we can measure may be the current limit of comparative
curriculum studies. Perhaps if early childhood curriculum developers would
use formative evaluation as vigorously as Sesame Street did, and could
develop criterion-referenced tests in the process, we would make greater
progress on the issue of the effectiveness of early intervention: for
whom, and for what?

The analyses in Planned Variation are directed primarily to this
interactive question: what approaches have what effects on which children?
Is there "one best" approach across all outcome measures and for all
children? Are there "equally good" approaches? Or do some programs prove
effective for some outcomes but not others, a specificity of effect that
seems to be more than hinted at by existing data? Or may some programs
have certain effects for some children but not others? From a policy
viewpoint, the neatest outcome would be either the "equally good" or the
"one best" approaches. Finding a specificity of effect will require
considerable rethinking of our curricular models and developing sophistication
on the part of program directors, parents, and teachers in choosing outcomes
wisely. Most complex would be educationally significant child x program x
outcome interactions: this finding, which is at the core of "the problem of
the match" and much early childhood education belief, would require even more
sophistication in individualization of instruction than we now have
available, except perhaps in extensions downward of IPI.

A different methodological aspect is that the SRI and Huron analyses
have identified analytic problems centering around change or gain scores
in groups with different baselines to begin with and probably different
regression lines; comparisons of magnitude of effects against scales which
are not standardized to a common unit are equally perplexing for tests of
interactions by outcomes. Among Planned Variation's methodological
contributions should be identification of which of our thorniest problems can
be solved with current statistical techniques and which represent essentially
unnegotiable design requirements: on what can researchers negotiate, because
alternative solutions are now available which will permit rigorous inference,
and what represents unnegotiable demands if the outcome desired is rigorous
inference about program effectiveness?
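A small simulation suggests why gain scores in groups with different baselines are perplexing. Its assumptions are entirely hypothetical: a classical true-score model, a pretest reliability of 0.5, two groups that differ only in baseline ability, and no program effect at all. On the same data, a simple gain-score comparison and an analysis of covariance on the pretest give different answers (the pattern known as Lord's paradox). This is an illustration of the general problem, not the analysis SRI or Huron performed; it assumes NumPy is available.

```python
import numpy as np


def simulate(n_per_group=20000, baseline_gap=1.0, seed=0):
    """Two groups with NO true program effect; they differ only at baseline.

    Hypothetical model: true ability T; pretest X = T + error;
    posttest Y = T + error. Error variance equals true-score variance,
    so pretest reliability is 0.5.
    """
    rng = np.random.default_rng(seed)
    group = np.repeat([0, 1], n_per_group)
    true_ability = rng.normal(baseline_gap * group, 1.0)
    pretest = true_ability + rng.normal(0.0, 1.0, group.size)
    posttest = true_ability + rng.normal(0.0, 1.0, group.size)
    return group, pretest, posttest


def gain_score_effect(group, pretest, posttest):
    """Difference between groups in mean (posttest - pretest)."""
    gain = posttest - pretest
    return gain[group == 1].mean() - gain[group == 0].mean()


def ancova_effect(group, pretest, posttest):
    """Group coefficient from OLS of posttest on intercept, group, pretest."""
    design = np.column_stack([np.ones_like(pretest), group, pretest])
    coef, *_ = np.linalg.lstsq(design, posttest, rcond=None)
    return coef[1]


g, x, y = simulate()
print(round(gain_score_effect(g, x, y), 2), round(ancova_effect(g, x, y), 2))
```

The gain-score comparison comes out near zero, while the covariance analysis reports a spurious "effect" of about 0.5: measurement error attenuates the within-group slope of posttest on pretest to about 0.5, so only half the baseline gap is adjusted away. Had the groups instead differed in true growth rates, the gain score would be the one misled; neither analysis is safe by default when baselines differ.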

Turning from the What of Planned Variation to the How: we begin with
three groups: evaluation, consultants, and case study reports. The
evaluation contractor was responsible for designing the study (sample size,
etc.), for developing the instruments, for fielding the national data
collection effort, for analyzing the data, and for writing the reports. The
team selected was Stanford Research Institute (SRI), because SRI was the
Follow Through contractor. Economy of effort plus continuity seemed an
obvious benefit of this arrangement. The second group was the consultants,
intended both as an extension of the Head Start officers responsible for
program implementation (Dr. Jenny Klein and Ms. Juanita Dennis) and as an
independent evaluation source of information on implementation. The sense
of a team in decision-making evolved during the study and was a creation of
it, not a component planned from the beginning. In the second year, sponsors,
consultants, OCD staff, and outside researchers formed a review panel which
met fairly regularly to discuss the status of the project and policy issues.
This review panel approach was adopted for Home Start, with the addition of
two parents, a model which, when involved from the beginning of Home Start,
has greatly strengthened the design. This also is an innovation: to the
best of my knowledge, no other Federal agency has an ongoing review panel
for national evaluative research which includes researchers and consumers.
The panels stay with their program through the final report in new studies
in OCD, and, if I can arrange it, they will in NIE, too.

The third part was a clinical case study of individual children that
was created early one morning when Jenny Klein and I shared a room and
insomnia. After a long meeting on the merry-go-round of personal-social
measurement, we still weren't happy with assessment and thus couldn't
sleep. The idea of a clinical approach came partly from my admiration of
the work of Robert Coles and partly from Jenny's background at the
University of Maryland Child Study Institute, where this was the method of
choice. So, as an experiment (because no one really knew how to use
clinical case data in a national study; it's easy to collect, but there are
almost no models for data reduction), the clinical case study was in from
the beginning.

Year 1 thus saw an emphasis on assessing implementation, the creation
of the Classroom Observation instrument, the investment in creating new
measures for years 2 and 3, and the clinical case history and the consultant
as innovations.

Year 2 added the review panel and substantially increased the investment
in developing new child and family measures. It also saw the
separation of the data collection responsibility from the planning and
analysis responsibility. After considerable effort to obtain acceptable
reports on time, we concluded that placing the responsibilities of planning,
field work, and data analysis on one contractor wasn't doable. This is a
conclusion to which I hold for longitudinal studies with high demands for
new measures and non-standard analytic techniques, and with a demand for
yearly or more frequent reports for national release. In Spring 1971,
Huron Institute became responsible for the Head Start Planned Variation
design and analysis, with SRI continuing responsibility for data
collection.

Year 3, the final year of the study, thus began with the consultants,
Huron Institute, SRI, the University of Maryland, and the
review panel as the principal components of the evaluation team. To
this was added a new idea: the sponsor-specific study, which was a
special set-aside for research which the sponsors might wish to do to
augment the other efforts and to present to the public their program
and their accomplishments in their own way. Year 4 is a phase-out year
for the sites, as planned. Huron, SRI, and the sponsors are analyzing
data and preparing reports. In spring, under Huron's guidance and with
the help of the consultants, OCD will collect data on what program elements
remain when program support is phased out. We also are concerned with the
longitudinal study: with what happens when children enter Follow Through.
This is another story, with its own set of design, measurement, and policy
issues, and one still too much in process to write of.

To summarize what we have learned about evaluative research administration
from the Head Start Planned Variation study that may be applicable to
similar studies:

• allow two years or more for implementation before a final 
program evaluation. 

• invest as much in studying the process of implementation
and establishing the extent of implementation as in
studying outcomes.



• select only treatments that are operationally defined, to begin 
with. 

• select treatments where (a) there is agreement to begin with
on what outcomes are to be reached (program objectives), and
(b) those outcomes can be reliably, feasibly measured
prior to study initiation.

• adopt multiple approaches to data collection: observation,
consultant reports, testing, case studies, and others,
allowing enough time to test out data reduction and
interpretation before a large-scale study is launched.

• identify statistical non-negotiables in treatment, site and 
child selection, and stick to them if the outcome desired 
is rigorous inference about program effects. 

• involve a review panel of participants, including
parents, from the beginning and throughout the project; it is
invaluable in preventing premature closure and providing
a stability of vision and concern for the study.

• separate data collection and data analysis responsibilities
(within a team approach, not sequentially), allowing about
two years of data reduction and analysis for every year of
data collection.

• set aside funds for sponsor-specific studies and second-generation
research.

• and, lastly, hope to be as fortunate as we were in the
hundreds of dedicated people who are willing to participate
in research on behalf of children.

Few who have worked with parents, children, and field data collectors
can come away untouched by the intensity of what Head Start, as a gateway
to a better life for children, means to so many people. Far more is
involved than job scarcity or protection of narrowly economic self-interest
in the hours and energy so many people have given to Planned Variation:
consultants trapped in snow storms, researchers getting up at midnight for
just one more computer run, community people focusing an almost palpable
energy on learning the classroom observation codes, teachers unlearning
the old and trying to learn the new, and most of all, the children
themselves, whom I have seen and loved, and whose trust we bear. One NAEYC
participant asked, "What does this mean for funding? For the children?"
This is a question for which we are answerable with our souls as we
report on the Planned Variation data and learn from PV both methodological
and programmatic lessons.



